15.2 Data structures


// 2024-05-04-wk15-02-data-structures.md

A data structure is a way of organizing data: a data representation together with the list of operations that can be performed on it.

All of the Python data types are powerful data structures, under the hood.

  • integers
  • floating point numbers
  • strings
  • lists
  • tuples
  • dictionaries
  • sets

For each of these data types (and any Python object) you can learn what it can do by checking out its attributes and methods. Even the lowly int data type has some interesting methods that you have been using without even knowing it.
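
As a quick illustration of checking attributes and methods, the sketch below lists the public names (those without a leading underscore) on an int. The helper name public_names is made up for this example, and the exact list may vary between Python versions.

```python
def public_names(obj):
    """Return the attribute and method names of obj that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


number = 10
names = public_names(number)
print(names)  # includes 'bit_length', 'to_bytes', 'real', 'imag', ...

# bit_length() reports how many binary digits are needed to store the value.
print(number.bit_length())  # 10 is 0b1010, so this prints 4
```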

In this module (15.2) you will learn about a few different data structures that will come in handy for your future Python work. You will also be able to use one of them to make your data-driven text adventure a lot easier to understand.

The int data structure

Because it is so simple, you probably have never thought about an integer as a data structure. But there is a lot of magic going on under the hood within the Python int data type.

>>> one = 1
>>> negative_one = -1
>>> zero = one + negative_one
>>> print(zero)
0
>>> dir(one)
['__abs__', '__add__', '__and__', ...]
>>> dir(3)
['__abs__', '__add__', '__and__', ...]

If you haven’t seen a function or variable name like __abs__ before, the double underscores at the beginning and end may look a little strange. These are sometimes called “hidden” methods. Python does not stop you from calling them, but you normally trigger them indirectly through operators and built-in functions such as + and abs(). Because Python is open source and introspectable, you can always “pop the hood” with the built-in dir function to see which methods great programmers designed into these data structures. Many objects also have a __dict__ attribute that you can print out, although some built-in objects, such as int instances, do not.
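
The sketch below shows one way to pick the double-underscore names out of the dir output, and to check whether an object has a __dict__ attribute at all. The variable names are chosen only for this example.

```python
one = 1

# Names that start and end with a double underscore are the special method names.
dunder_names = [
    name for name in dir(one)
    if name.startswith("__") and name.endswith("__")
]
print(dunder_names[:3])  # ['__abs__', '__add__', '__and__']

# Not every object carries a __dict__: int instances do not, but the int class does.
instance_has_dict = hasattr(one, "__dict__")
class_has_dict = hasattr(int, "__dict__")
print(instance_has_dict, class_has_dict)  # False True
```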

At work you may hear people call these “dunder methods.” When you’re doing a podcast or trying to tell someone what to type on their keyboard, “dunder” is a lot easier to say than “double underscore.”

Can you guess what the int.__abs__() operation does?

>>> one.__abs__()
1
>>> negative_one
-1
>>> negative_one.__abs__()
1

What about the other two dunder methods, __add__ and __and__?

>>> one.__add__(negative_one)
0
>>> one + negative_one
0

The __add__ method implements what is called a “binary operator.” It needs two values to do its work. The __abs__ method works with only one value and implements a “unary operator.”
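
You can give your own classes the same behavior by defining these dunder methods yourself. Here is a minimal sketch, assuming a hypothetical Vector2 class that exists only for this illustration:

```python
class Vector2:
    """A hypothetical 2-D vector, used only to illustrate dunder methods."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        # Binary operator: called for `self + other`, so it needs two values.
        return Vector2(self.x + other.x, self.y + other.y)

    def __abs__(self):
        # Unary operator: called for `abs(self)`, so it needs only one value.
        return (self.x ** 2 + self.y ** 2) ** 0.5


total = Vector2(1, 2) + Vector2(2, 2)  # Python calls Vector2.__add__
length = abs(total)                    # Python calls Vector2.__abs__
print(total.x, total.y, length)        # 3 4 5.0
```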

What about the __and__ method? Is that a binary or a unary operator? Where have you seen the word and in Python before?

>>> bool(one)
True
>>> bool(zero)
False
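>>> one and zero
0
>>> one & zero
0
>>> one.__and__(zero)
0

Both results are the integer 0, not False. The and keyword returns one of its operands, while __and__ is the method behind the bitwise & operator.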
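
With values other than 0 and 1, the and keyword and the & operator no longer agree. A minimal sketch, with values chosen only for illustration:

```python
two = 2  # binary 10
one = 1  # binary 01

# `and` returns the first falsy operand, or the last operand if all are truthy.
logical_result = two and one
print(logical_result)  # 1

# `&` calls __and__ and compares the two numbers bit by bit.
bitwise_result = two & one
print(bitwise_result)    # 0, because 10 and 01 have no bits in common
print(two.__and__(one))  # 0, the same operation written as a method call
```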

These are just the first 3 of the roughly 70 attributes and methods built into a Python int data type (the exact number depends on your Python version). To learn more about these basic data structures, use your IDE (Spyder) to find out how many attributes and methods a dictionary data type has. Then see if you can run one of the methods of a dict data structure, just like you did for the int data structure.

More powerful data structures

You may not think of Python data types when you hear the term “data structure.” In the workplace, when people use the term “data structure” they usually mean an organized way of collecting and manipulating collections of data values (objects). A sorted list of integers stored in an array is a common data structure. The real power of data structures becomes apparent once you start nesting simpler container data types, such as lists and dicts, within each other. In this section you will learn a bit more about the “graph” data structure, which you can implement using a dictionary of dictionaries.
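
To make the dictionary-of-dictionaries idea concrete, here is a minimal sketch of a map for a text adventure. The room names, directions, and the move function are all invented for this example.

```python
# Hypothetical map: each room maps a direction to the room you reach.
rooms = {
    "kitchen": {"north": "hallway"},
    "hallway": {"south": "kitchen", "east": "library"},
    "library": {"west": "hallway"},
}


def move(current_room, direction):
    """Return the room reached by going `direction`, or stay put if there is no exit."""
    return rooms[current_room].get(direction, current_room)


location = "kitchen"
for step in ["north", "east", "up"]:  # "up" is not a valid exit from the library
    location = move(location, step)
print(location)  # library
```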

Data types and data structures that can be used to hold other data objects are called “containers.” Containers can even contain themselves or other containers, creating a nested data structure. So a data structure is how you use container data types to organize your data so that you can process it for whatever problem you want to solve. The way you design your data structure can make your code extremely complicated or very simple.
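
A short sketch of nested containers, using made-up inventory data: a list of dicts acts as a small table, and a list can even hold a reference to itself.

```python
# A table stored as a list of dicts: each dict is one row.
inventory = [
    {"item": "lamp", "count": 1},
    {"item": "coin", "count": 12},
    {"item": "key", "count": 2},
]

# Because the structure is regular, processing it takes one line.
total_count = sum(row["count"] for row in inventory)
print(total_count)  # 15

# A container can even contain itself.
loop = [1, 2]
loop.append(loop)
print(loop[2] is loop)  # True
print(loop)             # [1, 2, [...]]
```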

Data structures are the fundamental building blocks of any program. A computer science degree usually involves several courses in data structures and database design. Some of the data structures you will learn about have whole courses devoted to them and have become whole industries all by themselves. Here are some data structures you have already seen, and some new ones that you may want to use in your future Python programs.

  • array — list
  • linked list — a list where you can only do a “sequential scan” in the order of the list and cannot skip around (cannot use “random access”)
  • table — list of lists or list of dicts
  • relational databases — tables connected by relationships
  • mapping — dict
  • directed graph — dict of dicts or connection matrix (list of lists) or edge list (list of 2-tuples) or adjacency list
  • undirected graph
  • directed acyclic graph
  • tree — a directed acyclic graph where every child node has only one parent (or every worker node has only one boss)
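
The directed graph entry in the list above names several representations. The sketch below builds the same small graph three ways. The node names and the weight value are placeholders invented for this example.

```python
# Edge list: each 2-tuple is one directed connection (source, target).
edges = [("a", "b"), ("a", "c"), ("b", "c")]
nodes = sorted({name for edge in edges for name in edge})

# Dict of dicts: graph[source][target] holds data about that edge.
graph = {name: {} for name in nodes}
for source, target in edges:
    graph[source][target] = {"weight": 1}  # the weight is a placeholder value

# Connection matrix: matrix[i][j] is 1 when nodes[i] points to nodes[j].
matrix = [
    [1 if target in graph[source] else 0 for target in nodes]
    for source in nodes
]

print(graph["a"])  # {'b': {'weight': 1}, 'c': {'weight': 1}}
print(matrix)      # [[0, 1, 1], [0, 0, 1], [0, 0, 0]]
```

The dict of dicts is convenient when you need to look up a node's neighbors by name, while the matrix form is compact when most pairs of nodes are connected.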

In the next section you will learn about one special data structure called a graph. This is the data structure I use when building chatbots for customers around the world.

Graph data structure

In computer science, a graph is a data structure containing objects connected to each other in a web or network of relationships. It is also sometimes called a network data structure. For example, the social graph data structure at Facebook (Meta) contains all the users and their connections to each other through “friend” relationships.

On Facebook, the friend relationships are mutual.
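
A mutual relationship like this can be sketched as an undirected graph by storing every connection in both directions. The names and the add_friendship helper below are hypothetical, and this is only an illustration, not how any real social network stores its data.

```python
from collections import defaultdict

# Hypothetical friend network: each person maps to the set of their friends.
friends = defaultdict(set)


def add_friendship(person, other):
    """Record a mutual (undirected) relationship by storing it in both directions."""
    friends[person].add(other)
    friends[other].add(person)


add_friendship("Ana", "Ben")
add_friendship("Ben", "Cy")

print(sorted(friends["Ben"]))                            # ['Ana', 'Cy']
print("Ana" in friends["Ben"], "Ben" in friends["Ana"])  # True True

# Friends that Ana and Cy have in common.
mutual = friends["Ana"] & friends["Cy"]
print(mutual)  # {'Ben'}
```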